Back Error Propagation Simulator (BPS)
Brief Tutorial
by
Emilio A. Apey
Copyright George Mason University
September 1989
Table of contents
1. Introduction
2. Walk-through an example
   Figure 1
   Figure 2
3. Main menu
   3.1 Build/load network
   3.2 Specify training set
   3.3 Edit and view menu
   3.4 Save network
   3.5 Find error
   3.6 Number of epochs
   3.7 Cycle until converges
   3.8 Adjust learning parameters
   3.9 Production menu
   3.10 Learn
   3.11 Transfer function options menu
4. Formulas
1) Introduction
The BPS program simulates a multilayer neural network using back
error propagation as the learning algorithm. This simulator is intended for
educational purposes and depends heavily on user interaction. It is
important for the user to be familiar with the back error propagation
algorithm before using BPS; in this way, the user can understand what
the simulator is doing.
In this brief "tutorial" most of the features of BPS will be
demonstrated by building a simple example. Figure 1, overview of BPS,
can be very useful in becoming familiar with BPS.
It is also highly recommended that the user draw the topology of
the network before using the simulator, labeling the layer numbers
and the unit numbers. This will be helpful when the program asks for
information about layers and units. To get an idea of a network
topology built by BPS, see figure 2.
2) Walk-through an example
A simple example is to teach a network the XOR (exclusive OR)
mapping, which is the following:
   input    output
   -----    ------
    0 0       0
    0 1       1
    1 0       1
    1 1       0
For this example there are four "training patterns," one for each
mapping that the network has to learn. Each training pattern consists of
an "input vector" and an "output vector."
The first thing to do, before entering BPS, is to write a "training set
file" and a "productions file" using an editor. The format of the training
set file is very flexible; the restrictions are that an input vector has to
precede its corresponding output vector, there must be at least one
blank character separating each number, and there can be no blank lines.
Figure 1: overview of BPS
Now, for BPS to give meaningful results the user has to be careful that
the dimension of the input vectors matches the number of units in the
input layer of the network, and that the dimension of the output vectors
matches the number of units in the output layer. If they do not match,
BPS will seem to be working normally, but the results will be
meaningless. Here are three training set file formats that could be
used in this example:
   0.0 0.0 0.1      0.0 0.0       0.0
   0.0 1.0 0.9      0.1           0.0
   1.0 0.0 0.9      0.0 1.0       0.1
   1.0 1.0 0.1      0.9           0.0
                    1.0 0.0       1.0
                    0.9           0.9
                    1.0 1.0       1.0
                    0.1           0.0
                                  0.9
                                  1.0
                                  1.0
                                  0.1
   Format 1         Format 2      Format 3
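Because only the order of the numbers and the white space between them
matter, all three formats carry the same stream of values. The following
Python sketch illustrates that idea; it is not BPS's own code, and the
function name read_training_set and its arguments are hypothetical.

```python
def read_training_set(text, n_inputs, n_outputs):
    """Split a whitespace-separated stream of numbers into training patterns.

    Hypothetical helper, not part of BPS: it only illustrates why the
    layout of the file does not matter as long as each input vector
    precedes its output vector.
    """
    numbers = [float(token) for token in text.split()]
    size = n_inputs + n_outputs
    if len(numbers) % size != 0:
        # This check is an addition; the tutorial says BPS itself keeps
        # running when the dimensions do not match.
        raise ValueError("number count does not match the vector dimensions")
    patterns = []
    for start in range(0, len(numbers), size):
        chunk = numbers[start:start + size]
        patterns.append((chunk[:n_inputs], chunk[n_inputs:]))
    return patterns


format_1 = "0.0 0.0 0.1\n0.0 1.0 0.9\n1.0 0.0 0.9\n1.0 1.0 0.1\n"
format_2 = "0.0 0.0\n0.1\n0.0 1.0\n0.9\n1.0 0.0\n0.9\n1.0 1.0\n0.1\n"
format_3 = "\n".join(format_1.split()) + "\n"   # one number per line

patterns = read_training_set(format_1, n_inputs=2, n_outputs=1)
same = (patterns
        == read_training_set(format_2, 2, 1)
        == read_training_set(format_3, 2, 1))
print(patterns[1], same)
```

The explicit length check is the one thing this sketch does that BPS
reportedly does not, which is why matching the dimensions by hand matters.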
The productions file is similar to the training set file, but the
output vectors are omitted; the network has to produce them. As with
the training set file, the format is flexible, but there can be no blank
lines.
   0.0 0.0          0.0           0.0
   0.0 1.0          0.0           0.0
   1.0 0.0          0.0           0.0
   1.0 1.0          1.0           1.0
                    1.0           1.0
                    0.0           0.0
                    1.0           1.0
                    1.0           1.0
   Format 1         Format 2      Format 3
For the XOR example the training set file will be named XOR.DAT and
the productions file will be named XOR.PRO.
From the training patterns it can be seen that the network needs to
have two input units (the dimension of an input vector) and one output
unit (the dimension of an output vector). The number of hidden layers
and hidden units is up to the user; it is good practice to keep networks
as small as possible because it takes less time to train them, since
there are fewer weights to update. However, if they are too small they
might not be able to learn all the training patterns; this is one of the
artistic aspects of neural networks. The simplest topology that can
learn the XOR mapping is the following:
Figure 3: XOR network topology
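To see that a network this small is able to represent XOR, here is a
Python sketch of a forward pass through the topology described in this
tutorial: two input units, one hidden unit, one output unit, and two
direct edges from the inputs to the output. The weights and thresholds
are hand-picked for illustration; they are assumptions, not values that
BPS produces.

```python
import math


def sigmoid(net):
    """Standard logistic function with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def xor_net(x1, x2):
    """Forward pass through the 2-1-1 topology with two direct edges.

    All weights and thresholds below are hand-picked for illustration;
    they are not values produced by BPS.
    """
    # Hidden unit (layer 2, unit 1): fires only when both inputs are on.
    hidden = sigmoid(10.0 * x1 + 10.0 * x2 - 15.0)
    # Output unit (layer 3, unit 1): sees both inputs directly (the two
    # extra edges) and is inhibited by the hidden unit.
    return sigmoid(10.0 * x1 + 10.0 * x2 - 20.0 * hidden - 5.0)


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xor_net(x1, x2)))
```

Training has to discover weights that play a similar role starting from
small random values, which is why it can take many epochs.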
Now it is time to get the program running and start doing something
with this network. To run the program on the VAX 8530, type BPS. If
you have the PC version, select the executable that best fits
your computer (bps286.exe or bps8088.exe) and type its name without the
extension.
gmuvax> bps
Once the initial message appears, hit return to enter the main menu.
3) Main menu
This menu is where the user spends the most time; here networks are
defined and then trained. Under this menu there are three
other menus: the production menu, the edit and view menu, and the
transfer function menu; they are described later on.
3.1) Build/load network (b)
The first thing to do when using BPS is to define a network, because
many functions depend on the network topology. To define a network
choose "b" (for build) and answer the questions. Remember to hit the
return key after each entry.
Suppose that you have been training a network for a while and then
decide to create or load a new network. BPS will ask if you want to
save the old network. Then, BPS will ask if you want to load a network
from your working directory:
Do you want to load a network from file? (y/n): n
To load a network answer "y"; then BPS will ask you for the name of
a previously defined network. The network is loaded and BPS returns to
the main menu.
If you want to define a new network instead, answer "n".
Next BPS will ask if you want to initialize the network with random
weights.
Do you want random weights? (y/n): y
When the answer is "y", BPS will completely connect all the units of
adjacent layers and the initial weights of the edges will be in the range
that you will soon specify. When the answer is "n", BPS will not
connect any unit in the network at all; it will create the layers and
units, but there will not be any edges. To connect the units you would
go to the Edit and View menu and add edges between the units that you
want to be connected. So, at this point, BPS would create only a
skeleton of the network.
In the XOR example two edges will be added later on, an edge going
from the first unit in the first layer to the first unit in the third layer,
and an edge going from the second unit in the first layer to the first
unit in the third layer (see fig. 3). The rest of the network is fully
connected between adjacent layers, so at this point answer "y": you do
want edges with random weights between the layers.
Then BPS will ask you for the lower and upper limit of the random
weights. A good range is -0.34 to 0.34, that is, roughly -1/3 to +1/3.
Give lower limit of random interval: -0.34
Give upper limit of random interval: 0.34
Initializing the weights with numbers close to zero is good
practice because, if the weights are large, it is very difficult,
sometimes impossible, for BPS to change those weights.
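One way to see why large weights are hard to change: the delta formulas
in section 4 contain the factor (O/M)(1 - O/M), which shrinks toward zero
as a unit's activity approaches either end of the sigmoid. The sketch
below compares that factor for a small random weight and for an
arbitrarily chosen large one, assuming M = 1, an input activity of 1.0,
and a zero threshold; the helper names are hypothetical.

```python
import math
import random


def sigmoid(net):
    """Standard logistic function with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def slope_factor(net):
    """The O * (1 - O) factor that scales every delta (assuming M = 1)."""
    activity = sigmoid(net)
    return activity * (1.0 - activity)


random.seed(0)  # arbitrary seed, only to make the sketch repeatable
small_weight = random.uniform(-0.34, 0.34)
large_weight = 20.0  # an arbitrary "large" weight for comparison

# Net input of a unit fed by a single source unit with activity 1.0.
print(slope_factor(small_weight * 1.0))  # close to the maximum of 0.25
print(slope_factor(large_weight * 1.0))  # practically zero
```

With the factor near zero, the weight change computed for a saturated
unit is tiny no matter how large the output error is.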
Next, BPS will ask you how many layers of units you want. A typical
back propagation network has three layers, but if you want to
experiment with more layers BPS allows you to build networks with
more layers. From figure 3 you can see that the network has three
layers, so answer "3".
Enter number of layers: 3
Now the number of units for each layer has to be specified. BPS will
ask for the number of units in each layer in turn, from the first layer
(input layer) up to the last layer (output layer).
Be careful that the number of units in the input layer and output layer
matches the dimensions of the input and output vectors specified in the
training set.
Number of nodes for layer no. 1: 2
Number of nodes for layer no. 2: 1
Number of nodes for layer no. 3: 1
Since two units are needed for the first layer type "2"; for the
second layer one unit is needed, so type "1"; and for the third layer one
unit is needed; type "1".
BPS now returns to the main menu and it has given the default name
of "unnamed" to the network. By now the network looks like this:
Figure 4: XOR partial network topology
3.2) Specify training set (t)
To specify or change the training set, choose "t" (for training set)
from the main menu. BPS will ask you for the name of the file where
you wrote the training patterns.
Enter the file name of the training set: xor.dat
3.3) Edit and View menu (e)
When you choose "e" (for edit and view) in the main menu the "edit
and view menu" appears. In this menu you can inspect the network and
modify it. The options on the left of the menu are to modify the
network and the options on the right of the menu are to inspect the
network.
Choose "a" (for activities and thresholds) to see the activities and
thresholds of the units in the network. The screen looks like this:
Layer  Node        Activity   Threshold
  1      1  ---->  0.000000   0.000000
  1      2  ---->  0.000000   0.000000
  2      1  ---->  0.000000   0.000000
  3      1  ---->  0.000000   0.000000
From this screen you can see that the first layer has two units (1,2),
the second layer has one unit, and the third layer has one unit. Since
this network was just defined, the activities and thresholds of the
units are all zeros.
Now choose "w" to inspect the weights in the edges; this is one of
the most used options in BPS. The screen will look like this:
Layer  Node    Weight    Node  Layer
  1      1    -0.20023     1     2
  1      2     0.23874     1     2
  2      1     0.18973     1     3
The layers and units on the left are the source units, and the layers
and nodes on the right are the target units. From this screen you can
see the weights between two units.
The XOR network needs two more edges, so choose "e" (for edge) to
add a new edge. Edges in the network have direction; that is they come
out of one unit and they go into another unit; to add an edge you have to
specify the source unit and the target unit. Since BPS handles
multilayer networks, each unit location is defined by the number of the
layer where the unit is and the unit's number in that layer; see figure 3
for layer and unit numbers. The following entries are to add an edge
from the first unit in the first layer to the first unit in the third layer:
Layer number of source unit: 1
Number of source unit: 1
Layer number of target unit: 3
Number of target unit: 1
Do you want a random weight? (y/n): y
Change random weight range? (y/n): n
The given weight was: -0.15652
As you can see, BPS will ask you if you want a random weight; if you
answer "n" BPS will ask you for a specific weight that you want to give
to that edge. If you decide to add a random weight, you can change the
range for the new weight.
Now that a new edge has been added, you can, if you want, check it
by choosing "w" (for weights) and see if the new edge is really there.
To add the second edge from the second unit in the first layer to the
first unit in the third layer choose "e" again; and respond to the
prompts, but this time the unit number of the source unit is 2. After
you have done this you can check the network by choosing "w" again. At
this point the network topology should look like figure 3 above.
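The steps so far (fully connecting adjacent layers with random weights,
then adding two extra edges) can be mirrored in a few lines of Python.
This is only a sketch: the dictionary used to store edges and the
function names are assumptions made for illustration, not BPS's
internal representation.

```python
import random


def build_network(layer_sizes, low, high, rng):
    """Fully connect adjacent layers with random weights.

    Edges are stored as
        {((src_layer, src_unit), (dst_layer, dst_unit)): weight}
    with 1-based layer and unit numbers, mirroring the BPS prompts.
    This data structure is an assumption made for illustration.
    """
    edges = {}
    for layer in range(1, len(layer_sizes)):
        for src in range(1, layer_sizes[layer - 1] + 1):
            for dst in range(1, layer_sizes[layer] + 1):
                edges[((layer, src), (layer + 1, dst))] = rng.uniform(low, high)
    return edges


def add_edge(edges, source, target, weight):
    """Add one directed edge from source to target."""
    edges[(source, target)] = weight


rng = random.Random(1)  # arbitrary seed
edges = build_network([2, 1, 1], -0.34, 0.34, rng)
add_edge(edges, (1, 1), (3, 1), rng.uniform(-0.34, 0.34))
add_edge(edges, (1, 2), (3, 1), rng.uniform(-0.34, 0.34))

# Print the edges in a layout similar to the "w" screen.
print("Layer  Node    Weight    Node  Layer")
for (src, dst), weight in sorted(edges.items()):
    print(f"{src[0]:3d} {src[1]:6d} {weight:11.5f} {dst[1]:5d} {dst[0]:5d}")
```

The listing should show five edges: three from the fully connected
skeleton and the two that were added by hand.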
The remaining options in the edit and view menu work similarly.
When modifying the network BPS will ask you about the location of the
node(s) involved, information that you can get from a diagram of the
network that you are building, like figure 3. The "t" (for training set)
choice is to see the training set that you specified without having to
leave BPS.
Now you can leave this menu by choosing "q"; this will put you back
in the main menu.
3.4) Save network (s)
At this point the network is complete and you may want to save
it; to do so, choose "s" (for save) from the main menu. When a network
is saved a name is given to the network; also, all
the parameters that the network has at that time are saved, including
the number of epochs that it has been trained. In this way, when a
network is loaded, all the parameters are loaded into BPS at the same
time.
Name for file to save network: xor211.net
The user can give any name to the network; XOR211.NET describes
the network as having two input units, one hidden unit, and one output
unit.
3.5) Find error (f)
If you want to know the current value of the network error
function, choose "f" (for find error) from the main menu. The new error
will be displayed in the main menu where it says "error." If you do not
choose "f" and keep training the network, the error displayed in the
main menu is not the present error, but it is the error when you
requested it the last time; this operation was designed to be user
activated.
Initially the error will be zero, but this does not mean that the
error of the network is zero; this is just an initialization.
3.6) Number of epochs (n)
This number has two different purposes, depending on which learning
mode you choose to train the network. One of these modes lets the
learning algorithm cycle until a specified error is reached; see the next
section for the other mode. "Cycle until converges" means that the
network will be trained until a user-specified error is reached. If
the user-specified error cannot be reached, BPS will exhaust the
number of epochs specified when "n" was chosen. In other words, the
network will be trained until either it reaches the target error or it
uses all the specified epochs.
When you choose "n", BPS will ask you for the number of epochs to
train the network.
Enter the number of epochs to train the network: 1000
With what we have specified up to now we could attempt to train
this network, but let's see the other options first.
3.7) Cycle until converges (c)
When "c" is chosen, you can turn the "cycle until converges" mode ON
or OFF. If you turn it ON, then BPS will ask you for the target error; if
you turn it OFF, BPS will return to the main menu.
The other mode of learning is active when "cycle until converges" is
OFF. It will train the network for the specified number of epochs,
without paying attention to the error of the network.
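The relationship between the number of epochs and the two learning modes
can be summarized in a short Python sketch. The callables run_epoch and
network_error are hypothetical stand-ins for one pass over the training
set and for the "find error" computation, and the "less than or equal"
comparison against the target error is an assumption.

```python
def train(run_epoch, network_error, n_epochs,
          cycle_until_converges=False, target_error=0.0):
    """Sketch of the two learning modes.

    run_epoch and network_error are hypothetical callables; they are not
    BPS functions. Returns the number of epochs actually used.
    """
    for epoch in range(1, n_epochs + 1):
        run_epoch()
        # Assumption: convergence means error <= target error.
        if cycle_until_converges and network_error() <= target_error:
            return epoch          # target reached before the budget ran out
    return n_epochs               # epoch budget exhausted


# Toy stand-in for a network: the "error" simply halves every epoch.
state = {"error": 1.0}


def fake_epoch():
    state["error"] *= 0.5


def fake_error():
    return state["error"]


print(train(fake_epoch, fake_error, 1000, True, 0.01))
```

With the mode OFF, the same call always runs all n_epochs and never looks
at the error, which matches the behavior described above.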
For the XOR example leave "cycle until converges" OFF; so for now you
do not need to choose "c".
3.8) Adjust learning parameters (a)
This is the fun part of BPS: adjusting the learning parameters and
seeing with which set of parameters the network learns faster, or if it
learns at all. The back propagation algorithm uses two parameters to
adjust the weights in a network: eta (learning rate) and alpha
(momentum factor); see section 4. BPS has a speed up mechanism that
adjusts these two parameters after each epoch; this mechanism
introduces two new variables: an upper bound for eta and an increasing
rate for eta (see section 4).
These four variables can be set by the user. They default to
conservative values, so most networks will learn the patterns slowly.
For some topologies a larger bound for eta, such as 1.0, will allow the
network to learn faster; for other topologies, a large eta leads to
instability and the weights become very large. Experience has shown
that if a network's weights have become large (above 10) after a few
thousand epochs and the error is still large, that network will
probably not learn the mapping. However, if after a few thousand epochs
the network has not learned the mapping but its weights are still small
(below 2), there is still a chance for the network to learn the
mapping.
It is good practice to save a network as soon as you define it. In
this way, if the network does not learn with a set of learning
parameters, you can load the original network and try another set of
learning parameters; this allows you to start from the same initial
state. If the network still does not learn, then you can try another
initial state (another set of random weights) by creating a new
network. And if the network still does not learn, then you can try
another topology: add units or edges.
As you can see, there is some testing to be done with these
networks; but the more experiments that you do, the more familiar you
will become with their behavior.
When you choose "a" (for adjust learning parameters), you can change
the values of these four variables: eta, alpha, bound of eta, and
increasing rate of eta. For the XOR problem the default values will be
used; so once again, you do not need to choose "a".
3.9) Production menu (p)
Before training the network let's see what kind of output it is
producing. To do this choose "p" (for production menu) from the main
menu. In the production menu you can do productions (propagate input
vectors through the network to obtain output vectors) from a file or
from the keyboard. You can also save those productions in a file. All
these possibilities are shown in this menu.
To do productions from the keyboard choose "k"; BPS will give you
the number of input units (dimension of input vector) and ask you for
the input vector:
Input vector: 0.0 0.0
Output vector: 0.52343
The terms of the input vector must be separated by a space; you
can also hit return and give more terms on the next line. The output at
this time will be around 0.5 because the network has not been trained
yet. Try other input vectors and you will see that all of the output
vectors are around 0.5.
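The reason is that small weights keep every net input close to zero, and
the sigmoid of zero is 0.5. A short Python sketch of an untrained XOR
network, assuming the five weights are drawn from [-0.34, 0.34] and all
thresholds are zero (the variable names are hypothetical):

```python
import math
import random


def sigmoid(net):
    """Standard logistic function with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


rng = random.Random(42)  # arbitrary seed
# Five weights of the XOR network, all drawn from [-0.34, 0.34];
# thresholds are left at zero, as on the activities screen.
w_in1_hid, w_in2_hid, w_hid_out, w_in1_out, w_in2_out = (
    rng.uniform(-0.34, 0.34) for _ in range(5))


def produce(x1, x2):
    """Propagate one input vector through the untrained network."""
    hidden = sigmoid(w_in1_hid * x1 + w_in2_hid * x2)
    return sigmoid(w_hid_out * hidden + w_in1_out * x1 + w_in2_out * x2)


outputs = [produce(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print([round(o, 3) for o in outputs])
```

Whatever the seed, the net input of the output unit cannot exceed about 1
in magnitude with these weights, so every output stays in the middle of
the sigmoid's range.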
Sometimes you want to give a large set of productions to the
network; that is why the production file was created, because you will
get tired of typing input vectors. So you can specify a production file
where you have the set of productions that you want to do. To set the
production file choose "p" and BPS will ask you:
Enter file name: xor.pro
BPS will remember this file, so you do not have to specify it every
time that you enter the production menu. Sometimes you may want to
have several production files for the same network; to change the
production file just choose "p" again and give the name of the new file.
The "v" (for view) option is to view the production file, to check if it is
the right set of productions that you want to test.
After you have specified a production file you can choose "f" (for
productions from file) and get many productions at once.
Suppose that you want to save productions when you do them either
from a file or from the keyboard. You can tell BPS to open a file to
write the productions. To do this choose "s" (for save); this choice is
used to open and close a file. BPS will ask whether you want to save
productions; to open a file respond "y" and then give the file name:
Save productions in a file? (y/n): y
Enter file name to save productions: xor.out
If you respond "n" and a saving file was open, BPS will close the file.
However, you do not need to close this file specifically; if you leave the
production menu and a saving file was open, BPS will close it for you.
Now you know everything about the production menu, so you can
choose "q" to return to the main menu.
3.10) Learn (l)
Finally the network is ready to start training. Choose "l"
(for learning); depending on the state of the "cycle until converges"
switch (see section 3.7), learning will run in one of the two
modes. Since we left this switch OFF for this example, learning
will exhaust the specified number of epochs; the display while the
network is being trained will look like this:
20 eta: 0.06 error: 0.32012345
40 eta: 0.10 error: 0.32012201
60 eta: 0.45 error: 0.31999982
80 eta: 0.60 error: 0.31999873
. . .
. . .
From this display you can see how the epoch number increases, how
eta changes and (hopefully) how the error decreases.
The error will probably stay around 0.32 for the first 1000 epochs,
and maybe for up to 3000 epochs. But once the error starts decreasing,
it drops very fast. If the learning rate is high (around 0.95), the error
will decrease up to a point and then stay there; if the learning rate is
too high (above 1.0), the error will decrease rapidly up to a point and
then it will start increasing. You may even get a run-time error,
"floating point exception"; this is because the weights became
extremely large. You can train the network for a while and then go to
the "edit and view" menu to see how the weights have changed. Try
training the same initial network with several learning parameters and
see what kind of results you obtain.
These exercises will help you to get an idea of how the back
propagation algorithm works so you can do bigger applications
afterwards.
3.11) Transfer function options menu (m)
You enter this menu by choosing "m" from the main menu. This menu
has two options: to change the amplitude of the sigmoid function (M)
and to shift the sigmoid function downward so it is symmetrical around
the x-axis.
Figure 5: Sigmoid function with range [0.0, M] and [-M/2, M/2]
By default the range of the sigmoid function is between 0.0 and 1.0.
When you select "m" in this menu BPS will ask you for the new upper
limit of the sigmoid function:
Enter new value: 2.0
Now the range of the sigmoid function is [0.0,2.0] instead of being in
the range [0.0,1.0]. Note that to be able to train a network with this
new amplitude, you have to define the training patterns to be in this
range also.
The other option is to make the sigmoid function symmetrical with
respect to the x-axis; this option can be switched ON and OFF by
choosing "s" in this menu. Note that the range of the sigmoid function
will then be between -M/2 and M/2. For example, if you specify M = 2.0,
the range of the sigmoid function is between -1.0 and 1.0.
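Both options can be expressed in one small Python function. This is a
sketch based on the transfer function given in section 4; the function
and parameter names are hypothetical, and applying the M/2 shift only
when the symmetric option is ON is an interpretation of this section.

```python
import math


def transfer(net, amplitude=1.0, symmetric=False):
    """Sigmoid with amplitude M, optionally shifted down by M/2."""
    value = amplitude / (1.0 + math.exp(-net / amplitude))
    return value - amplitude / 2.0 if symmetric else value


# Sample the function far to the left, at zero, and far to the right.
for net in (-50.0, 0.0, 50.0):
    print(net, transfer(net, 2.0), transfer(net, 2.0, symmetric=True))
```

With M = 2.0 the plain version should approach 0.0 and 2.0 at the
extremes, and the symmetric version -1.0 and 1.0.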
4) Formulas
To give you more insight into what BPS is doing while a network is
being trained, this section describes some of the main formulas of the
learning algorithm.
In these formulas p stands for a training pattern. A summation over p
therefore covers one epoch: all the training patterns have to go
through the network. Symbols for the units are j, i, and k: j is the
unit whose parameters are being calculated, i ranges over units in the
previous layer, and k ranges over units in the next layer.
Figure 6: units and subscripts
Weight adjustment:
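$$\Delta W_{ji}(t+1) = \eta \sum_p \left( \delta_{pj} O_{pi} \right) + \alpha \, \Delta W_{ji}(t) \tag{1}$$
where
$$\delta_{pj} = (T_{pj} - O_{pj}) \, \frac{O_{pj}}{M} \left( 1 - \frac{O_{pj}}{M} \right) \quad \text{for output units,}$$
$$\delta_{pj} = \frac{O_{pj}}{M} \left( 1 - \frac{O_{pj}}{M} \right) \sum_k \left( \delta_{pk} W_{kj} \right) \quad \text{for hidden units.}$$
\eta : learning rate (eta).
\alpha : momentum factor (alpha).
O : unit activity level.
T : output unit target.
M : sigmoid function amplitude.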
One difference between formula 1 and the original back propagation
formula is the summation, over all the training patterns p, of the
products of the deltas and the previous unit's activation. Because of
this summation, the weights are updated only once per epoch of training.
This is the first part of a two-part method proposed by Vogl to
accelerate learning by back propagation.
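As a sketch of how formula 1 and the two delta formulas fit together,
the Python functions below compute one weight change per epoch from
per-pattern deltas and activities. The function names are hypothetical
and the numbers in the example are arbitrary, chosen only so the
arithmetic is easy to follow.

```python
def output_delta(target, activity, amplitude=1.0):
    """delta_pj for an output unit."""
    scaled = activity / amplitude
    return (target - activity) * scaled * (1.0 - scaled)


def hidden_delta(activity, next_deltas, next_weights, amplitude=1.0):
    """delta_pj for a hidden unit; sums delta_pk * W_kj over the next layer."""
    scaled = activity / amplitude
    back = sum(d * w for d, w in zip(next_deltas, next_weights))
    return scaled * (1.0 - scaled) * back


def weight_change(eta, alpha, deltas, activities, previous_change):
    """Formula 1: one change per epoch, summed over all patterns p."""
    gradient_sum = sum(d * o for d, o in zip(deltas, activities))
    return eta * gradient_sum + alpha * previous_change


# Arbitrary illustrative values for one edge and three patterns.
deltas = [0.1, -0.05, 0.2]        # delta_pj of the receiving unit
activities = [1.0, 1.0, 0.5]      # O_pi of the sending unit
change = weight_change(eta=0.5, alpha=0.9, deltas=deltas,
                       activities=activities, previous_change=0.02)
print(change)
```

Here the summed term is 0.1 - 0.05 + 0.1 = 0.15, so the change is
0.5 * 0.15 + 0.9 * 0.02 = 0.093, up to floating point rounding.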
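Transfer function:
$$O_{pj} = M \left( \frac{1}{1 + e^{-\left( \sum_i W_{ji} O_{pi} + \theta_j \right) / M}} - \frac{1}{2} \right)$$
\theta_j : threshold of unit j.
Error function:
$$\text{error} = \sum_p \left[ \frac{1}{2} \sum_j (T_{pj} - O_{pj})^2 \right]$$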
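The error function is easy to evaluate by hand for the XOR example. The
Python sketch below (the function name is hypothetical) computes it for
the 0.1/0.9 targets of the sample training set when every output is
stuck at 0.5. The result, 4 * 1/2 * 0.4**2 = 0.32, would be consistent
with the plateau around 0.32 mentioned in section 3.10.

```python
def network_error(targets, outputs):
    """Sum over patterns p of 1/2 * sum over output units j of (T - O)**2."""
    total = 0.0
    for target_vector, output_vector in zip(targets, outputs):
        total += 0.5 * sum((t - o) ** 2
                           for t, o in zip(target_vector, output_vector))
    return total


xor_targets = [[0.1], [0.9], [0.9], [0.1]]
untrained = [[0.5], [0.5], [0.5], [0.5]]   # every output stuck at 0.5
print(network_error(xor_targets, untrained))
```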
Eta and alpha adjustment:
$$C(t) = \langle\langle \Delta W_{ji}(t-1), \Delta W_{ji}(t) \rangle\rangle \quad \text{(correlation)}$$
    if ( C(t) < 0 ) then
        eta   --> 0.01
        alpha --> 0.00
    else
        if ( C(t) < (C(t-1) + 0.05 * C(t-1)) and eta < eta bound ) then
            eta   --> eta + eta rate
            alpha --> alpha + 0.01
        endif
    endif
The correlation gives an indication of how the weights are changing.
If this term becomes negative, then eta and alpha are decreased
drastically to 0.01 and 0.0 respectively. If the correlation is positive,
then eta and alpha have passed the first test to be increased. The
second test is that eta must be below its maximum allowed value (its
bound) and the correlation must be below the last correlation plus
five percent.
The original correlation test was implemented by David Schreibman
and consisted of checking whether C(t) was positive or negative; if it
was negative the values of eta and alpha were dropped to low
values, and if it was positive eta and alpha were increased. The
additional test of checking whether the correlation is increasing or
decreasing was added to give more stability to the learning.
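The adjustment rule can be sketched in Python as follows. Several points
are assumptions: the correlation is read as a plain dot product of the
previous and current weight-change vectors, the two conditions of the
second test are combined with "and" as in the prose description, and the
function names and the sample values are hypothetical.

```python
def correlation(previous_changes, current_changes):
    """Dot product of two weight-change vectors.

    Reading the correlation as a plain dot product is an assumption.
    """
    return sum(p * c for p, c in zip(previous_changes, current_changes))


def adjust(eta, alpha, c_now, c_prev, eta_bound, eta_rate):
    """Return the (eta, alpha) pair to use for the next epoch."""
    if c_now < 0:
        return 0.01, 0.0                     # drastic reset
    if c_now < c_prev + 0.05 * c_prev and eta < eta_bound:
        return eta + eta_rate, alpha + 0.01  # both tests passed
    return eta, alpha                        # leave both unchanged


# Illustrative values only; they are not the BPS defaults.
eta, alpha = adjust(eta=0.05, alpha=0.5, c_now=0.2, c_prev=0.3,
                    eta_bound=1.0, eta_rate=0.05)
print(eta, alpha)
```

In this example the correlation is positive and below 1.05 times the
previous one, and eta is under its bound, so both parameters are raised.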